week 7: multilevel models

multilevel adventures

more than one type of cluster

McElreath doesn’t cover this in his video lecture, but this is from the textbook and worth discussing.

data(chimpanzees, package="rethinking")
d <- chimpanzees
str(d)
'data.frame':   504 obs. of  8 variables:
 $ actor       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ recipient   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ condition   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ block       : int  1 1 1 1 1 1 2 2 2 2 ...
 $ trial       : int  2 4 6 8 10 12 14 16 18 20 ...
 $ prosoc_left : int  0 0 1 0 1 1 1 1 0 0 ...
 $ chose_prosoc: int  1 0 0 1 1 1 0 0 1 1 ...
 $ pulled_left : int  0 1 0 0 1 1 0 0 0 0 ...
unique(d$actor)
[1] 1 2 3 4 5 6 7
unique(d$block)
[1] 1 2 3 4 5 6
unique(d$prosoc_left)
[1] 0 1
unique(d$condition)
[1] 0 1

We could model the interaction between condition (presence/absence of another animal) and option (which side is prosocial), but it is more difficult to assign sensible priors to interaction effects. Another option, because we’re working with categorical variables, is to turn our 2x2 into one variable with 4 levels.

d$treatment <- factor(1 + d$prosoc_left + 2*d$condition)
d %>% count(treatment, prosoc_left, condition)
  treatment prosoc_left condition   n
1         1           0         0 126
2         2           1         0 126
3         3           0         1 126
4         4           1         1 126

In this experiment, each pull is within a cluster of pulls belonging to an individual chimpanzee. But each pull is also within an experimental block, which represents a collection of observations that happened on the same day. So each observed pull belongs to both an actor (1 to 7) and a block (1 to 6). There may be unique intercepts for each actor as well as for each block.

Mathematical model:

\[\begin{align*} L_i &\sim \text{Binomial}(1, p_i) \\ \text{logit}(p_i) &= \bar{\alpha} + \alpha_{\text{ACTOR[i]}} + \bar{\gamma} + \gamma_{\text{BLOCK[i]}} + \beta_{\text{TREATMENT[i]}} \\ \beta_j &\sim \text{Normal}(0, 0.5) \text{ , for }j=1..4\\ \alpha_j &\sim \text{Normal}(0, \sigma_{\alpha}) \text{ , for }j=1..7\\ \gamma_j &\sim \text{Normal}(0, \sigma_{\gamma}) \text{ , for }j=1..7\\ \bar{\alpha} &\sim \text{Normal}(0, 1.5) \\ \bar{\gamma} &\sim \text{Normal}(0, 1.5) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\gamma} &\sim \text{Exponential}(1) \\ \end{align*}\]

m3 <- 
  brm(
    family = bernoulli,
    data = d, 
    bf(
      pulled_left ~ a + b, 
      a ~ 1 + (1 | actor) + (1 | block), 
      b ~ 0 + treatment, 
      nl = TRUE),
    prior = c(prior(normal(0, 0.5), nlpar = b),
              prior(normal(0, 1.5), class = b, coef = Intercept, nlpar = a),
              prior(exponential(1), class = sd, group = actor, nlpar = a),
              prior(exponential(1), class = sd, group = block, nlpar = a)),
  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.3")
  )
m3
 Family: bernoulli 
  Links: mu = logit 
Formula: pulled_left ~ a + b 
         a ~ 1 + (1 | actor) + (1 | block)
         b ~ 0 + treatment
   Data: d (Number of observations: 504) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~actor (Number of levels: 7) 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(a_Intercept)     2.04      0.66     1.12     3.63 1.01     1424     2076

~block (Number of levels: 6) 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(a_Intercept)     0.21      0.17     0.01     0.63 1.00     1587     1660

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a_Intercept      0.58      0.71    -0.76     2.01 1.00     1136     1789
b_treatment1    -0.13      0.30    -0.71     0.46 1.00     2102     2948
b_treatment2     0.40      0.30    -0.20     0.99 1.00     1820     2576
b_treatment3    -0.48      0.30    -1.06     0.11 1.00     1937     2509
b_treatment4     0.28      0.30    -0.29     0.89 1.00     1910     2502

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
posterior_summary(m3)
                             Estimate Est.Error          Q2.5        Q97.5
b_a_Intercept            5.835058e-01 0.7051870 -7.574382e-01    2.0133561
b_b_treatment1          -1.285035e-01 0.3000948 -7.111530e-01    0.4608178
b_b_treatment2           3.968713e-01 0.2977210 -2.006941e-01    0.9864687
b_b_treatment3          -4.770727e-01 0.3000809 -1.057934e+00    0.1067693
b_b_treatment4           2.815642e-01 0.2984428 -2.944552e-01    0.8881553
sd_actor__a_Intercept    2.036780e+00 0.6561266  1.116480e+00    3.6306757
sd_block__a_Intercept    2.088196e-01 0.1743707  8.311932e-03    0.6295848
r_actor__a[1,Intercept] -9.391636e-01 0.7061219 -2.360315e+00    0.3928665
r_actor__a[2,Intercept]  4.108310e+00 1.3648897  2.040292e+00    7.2360335
r_actor__a[3,Intercept] -1.245105e+00 0.7113142 -2.669128e+00    0.1290459
r_actor__a[4,Intercept] -1.245720e+00 0.7152689 -2.676802e+00    0.1138210
r_actor__a[5,Intercept] -9.385328e-01 0.7167749 -2.405589e+00    0.4292875
r_actor__a[6,Intercept]  1.203465e-03 0.7156267 -1.441203e+00    1.3720687
r_actor__a[7,Intercept]  1.526423e+00 0.7600257  4.334721e-02    3.0016306
r_block__a[1,Intercept] -1.664840e-01 0.2169600 -7.073766e-01    0.1194874
r_block__a[2,Intercept]  3.467096e-02 0.1698465 -2.979211e-01    0.4263189
r_block__a[3,Intercept]  4.830452e-02 0.1783436 -2.839705e-01    0.4701920
r_block__a[4,Intercept]  1.138782e-02 0.1732665 -3.519582e-01    0.3951538
r_block__a[5,Intercept] -2.934102e-02 0.1689017 -4.049284e-01    0.3082666
r_block__a[6,Intercept]  1.078886e-01 0.1934904 -1.977606e-01    0.5785745
lprior                  -6.336549e+00 1.2258524 -9.231991e+00   -4.5017996
lp__                    -2.866412e+02 3.7297778 -2.947833e+02 -280.1872116
m3 %>% 
  mcmc_plot(variable = c("^r_", "^b_", "^sd_"), regex = T) +
  theme(axis.text.y = element_text(hjust = 0))
Code
as_draws_df(m3) %>% 
  select(starts_with("sd")) %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value, fill = name)) +
  geom_density(linewidth = 0, alpha = 3/4, adjust = 2/3, show.legend = F) +
  annotate(geom = "text", x = 0.67, y = 2, label = "block", color = "#5e8485") +
  annotate(geom = "text", x = 2.725, y = 0.5, label = "actor", color = "#0f393a") +
  scale_fill_manual(values = c("#0f393a", "#5e8485")) +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle(expression(sigma["group"])) +
  coord_cartesian(xlim = c(0, 4))

exercise

Return to the data(Trolley) from an earlier lecture. Define and fit a varying intercepts model for these data, with responses clustered within participants. Include action, intention, and contact. Compare the varying-intercepts model and the model that ignores individuals using both some method of cross-validation.

solution

data(Trolley, package="rethinking")

# fit model without varying intercepts
m_simple <- brm(
  data = Trolley,
  family = cumulative, 
  response ~ 1 + action + intention + contact, 
  prior = c( prior(normal(0, 1.5), class = Intercept) ),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m71.e1")
)

# fit model with varying intercepts
m_varying <- brm(
  data = Trolley,
  family = cumulative, 
  response ~ 1 + action + intention + contact + (1|id), 
  prior = c( prior(normal(0, 1.5), class = Intercept),
             prior(normal(0, 0.5), class = b),
             prior(exponential(1), class = sd)),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m71.e2")
)

solution

# compare models using WAIC cross-validation
m_simple  <- add_criterion(m_simple , "loo")
m_varying <- add_criterion(m_varying, "loo")

loo_compare(m_simple, m_varying, criterion = "loo") %>% 
  print(simplify=F)
          elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
m_varying      0.0       0.0 -15669.2     88.7       354.2      4.6  31338.4
m_simple   -2876.0      86.2 -18545.1     38.1         9.2      0.0  37090.3
          se_looic
m_varying    177.5
m_simple      76.2
pp_check(m_simple, ndraws = 5, type="hist") +
  ggtitle("Simple Model")
pp_check(m_varying, ndraws = 5, type="hist") +
  ggtitle("Varying Intercepts Model")

predictions

Posterior predictions in multilevel models are a bit more complicated than single-level, because the question arises: predictions for the same clusters or predictions for new clusters?

In other words, do you want to know more about the chimps you collected data on, or new chimps? Let’s talk about both.

predictions for chimps in our sample

Recall that the function fitted() give predictions.

labels <- c("R/N", "L/N", "R/P", "L/P")

nd <- distinct(d, treatment, actor) %>% 
  mutate(block=1)

f <- fitted(m3,newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  mutate(treatment = factor(treatment, labels = labels))
Code
f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_line(color="#0f393a") +
  scale_y_continuous(limits=c(0,1)) +
  facet_wrap(~actor)
Code
# observed p
obs = d %>% 
  filter(block==1) %>% 
  group_by(actor, treatment) %>% 
  summarise(p = mean(pulled_left), .groups = "drop") %>% 
  mutate(treatment = factor(treatment, labels = labels))


f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_point( aes(y=p), 
              data=obs, 
              shape=1) +
  geom_line(color="#0f393a") +
  facet_wrap(~actor)

We can add in the observed probabilities.

predictions for new chimps

Even here, we have some choice. Let’s start by predicting scores for the average chimp.

post <- as_draws_df(m3)
avg_chimp = post %>% select(starts_with("b_")) %>% 
  pivot_longer(-b_a_Intercept) %>% 
  mutate(
    fitted = b_a_Intercept + value,
    fitted = inv_logit_scaled(fitted),
    treatment=factor(str_remove(name, "b_b_treatment"),
                     labels=labels)
  ) %>% 
  group_by(treatment) %>% 
  median_qi(fitted)
avg_chimp
# A tibble: 4 × 7
  treatment fitted .lower .upper .width .point .interval
  <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 R/N        0.613  0.295  0.866   0.95 median qi       
2 L/N        0.725  0.409  0.918   0.95 median qi       
3 R/P        0.526  0.228  0.820   0.95 median qi       
4 L/P        0.701  0.381  0.908   0.95 median qi       
Code
f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_point( aes(y=p), 
              data=obs, 
              shape=1) +
  geom_line(color="#0f393a") +
  geom_ribbon(aes( x=treatment, ymin=.lower, ymax=.upper ), 
              alpha=.3,
              fill="#e07a5f",
              data=avg_chimp,
              inherit.aes = F) +
  geom_line( aes(y=fitted), data=avg_chimp, color ="#e07a5f") +
  geom_line(color="#0f393a") +
  facet_wrap(~actor)

But the average chimp is only one possible chimp we could encounter. Let’s simulate 100 possible chimps.

Code
post %>% 
  slice_sample(n=100) %>% 
  # simulate chimps
  mutate(a_sim = rnorm(n(), mean = b_a_Intercept, sd = sd_actor__a_Intercept)) %>% 
  pivot_longer(b_b_treatment1:b_b_treatment4) %>% 
  mutate(fitted = inv_logit_scaled(a_sim + value)) %>% 
  mutate(treatment = factor(str_remove(name, "b_b_treatment"),
                            labels = labels)) %>%
  ggplot(aes(x = treatment, y = fitted, group = .draw)) +
  geom_line(alpha = 1/2, color = "#e07a5f") +
  coord_cartesian(ylim = 0:1) 

exercise

Returning to the Trolley data and the varying intercept model, get predictions for…

  1. a subset of 3 participants in the dataset.
  2. the average participant.
  3. 2 new participants.

Hint: don’t forget that the model uses a link function. You may need to play with arguments or fiddle around with the outputs of your functions.

solution

3 participants

Code
part3 = sample( unique(Trolley$id) , size=3, replace=F )
nd <- distinct(Trolley, action, intention, contact, id) %>% 
  filter(id %in% part3)

f <- fitted(m_varying, newdata = nd, scale = "response") %>% 
  data.frame() %>% 
  bind_cols(nd) 

f %>% 
  pivot_longer(-c(action:id),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p") +
  facet_grid(action+contact~id) +
  theme(legend.position = "bottom")

solution

The average participant

Code
nd <- distinct(Trolley, action, intention, contact) 

f <- fitted(m_varying, newdata = nd, scale = "response", 
            re_formula = NA) %>% 
  data.frame() %>% 
  bind_cols(nd) 

f %>% 
  pivot_longer(-c(action:contact),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p") +
  facet_grid(action~contact) +
  theme(legend.position = "bottom")

Two new participants

Code
# create data for 2 new participants
nd <- distinct(Trolley, action, intention, contact) %>%
  slice(rep(1:n(), times = 2)) %>%
  mutate(id = rep(c("New1", "New2"), each = n()/2))

# get predictions including random effects
f <- fitted(m_varying, newdata = nd, 
            scale = "response", allow_new_levels=T) %>% 
  data.frame() %>% 
  bind_cols(nd) 

# plot
f %>% 
  pivot_longer(-c(action:id),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p", fill="intention") +
  facet_grid(action+contact~id) +
  theme(legend.position = "bottom")